Compiling large-context phonetic decision trees into finite-state transducers
Author
Abstract
Recent work has shown that finite-state transducers (FSTs) offer many advantages in large-vocabulary speech recognition. Most past work has focused on triphone phonetic decision trees. However, numerous applications use decision trees that condition on wider contexts; for example, many systems at IBM use 11-phone phonetic decision trees. Large-context phonetic decision trees cannot be compiled straightforwardly into FSTs because of memory constraints. In this work, we discuss memory-efficient techniques for manipulating large-context phonetic decision trees in the FST framework. First, we describe a lazy expansion technique that is applicable when expanding small word graphs. For general applications, we discuss how to construct large-context transducers via a sequence of simple, efficient finite-state operations; we also introduce a memory-efficient implementation of determinization.
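To make the memory constraint concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes a hypothetical inventory of 50 phones and assumes that a naive context-dependency transducer keeps one state per (n-1)-phone history, which gives a rough upper bound on its size. It then contrasts that bound with the number of context windows that actually occur in a small set of phone sequences, which is the intuition behind expanding contexts lazily for small word graphs. The function names and the `<sil>` padding symbol are illustrative assumptions.

```python
def naive_state_bound(num_phones: int, context_width: int) -> int:
    """Rough upper bound on states, assuming one state per
    (context_width - 1)-phone history (an assumption of this sketch)."""
    return num_phones ** (context_width - 1)


def contexts_in_sequence(phones, left, right, pad="<sil>"):
    """Yield the context window around each phone in one sequence.

    `left` and `right` are the number of context phones on each side;
    `pad` is a hypothetical boundary symbol.
    """
    padded = [pad] * left + list(phones) + [pad] * right
    for i in range(len(phones)):
        yield tuple(padded[i : i + left + 1 + right])


def lazy_contexts(paths, left, right):
    """Collect only the context windows that occur in the given paths."""
    seen = set()
    for path in paths:
        seen.update(contexts_in_sequence(path, left, right))
    return seen


if __name__ == "__main__":
    # 11-phone context = 5 phones on each side of the center phone.
    print(naive_state_bound(50, 11))
    paths = [["k", "ae", "t"], ["k", "ae", "p"]]
    print(len(lazy_contexts(paths, left=5, right=5)))
```

Under these assumptions the naive bound grows exponentially with the context width, while the on-demand set is bounded by the total number of phones in the paths being expanded.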
Similar resources
Compilation of Weighted Finite-State Transducers from Decision Trees
We report on a method for compiling decision trees into weighted finite-state transducers. The key assumptions are that the tree predictions specify how to rewrite symbols from an input string, and the decision at each tree node is stateable in terms of regular expressions on the input string. Each leaf node can then be treated as a separate rule where the left and right contexts are constructa...
Efficient Development of Lexical Language Resources and their Representation
Statistical approaches in speech technology, whether used for statistical language models, trees, hidden Markov models or neural networks, represent the driving forces for the creation of language resources (LR), e.g., text corpora, pronunciation and morphology lexicons, and speech databases. This paper presents a system architecture for the rapid construction of morphologic and phonetic lexico...
Direct construction of compact context-dependency transducers from data
This paper describes a new method for building compact context-dependency transducers for finite-state transducer-based ASR decoders. Instead of the conventional phonetic decision tree growing followed by FST compilation, this approach incorporates the phonetic context splitting directly into the transducer construction. The objective function of the split optimization is augmented with a regulari...
An efficient implementation of phonological rules using finite-state transducers
Context-dependent phonological rules are used to model the mapping from phonemes to their varied phonetic surface realizations. Others, most notably Kaplan and Kay, have described how to compile general context-dependent phonological rewrite rules into finite-state transducers. Such rules are very powerful, but their compilation is complex and can result in very large nondeterministic automata....
An Efficient Compiler for Weighted Rewrite Rules
Context-dependent rewrite rules are used in many areas of natural language and speech processing. Work in computational phonology has demonstrated that, given certain conditions, such rewrite rules can be represented as finite-state transducers (FSTs). We describe a new algorithm for compiling rewrite rules into FSTs. We show the algorithm to be simpler and more efficient than existing algorith...